Abstract

We offer a proof of the following nonconventional ergodic theorem:

Theorem. If $T_i : \mathbb{Z}^r \curvearrowright (X, \Sigma, \mu)$ for $i = 1, 2, \ldots, d$ are commuting
probability-preserving $\mathbb{Z}^r$-actions, $(I_N)_{N \geq 1}$ is a Følner sequence of subsets
of $\mathbb{Z}^r$, $(a_N)_{N \geq 1}$ is a base-point sequence in $\mathbb{Z}^r$ and
$f_1, f_2, \ldots, f_d \in L^\infty(\mu)$, then the nonconventional ergodic averages

$$\frac{1}{|I_N|} \sum_{n \in a_N + I_N} \prod_{i=1}^d f_i \circ T_i^n$$

converge to some limit in $L^2(\mu)$ that does not depend on the choice of
$(a_N)_{N \geq 1}$ or $(I_N)_{N \geq 1}$.

The leading case of this result, with $r = 1$ and the standard sequence of
averaging sets, was first proved by Tao in [16], following earlier analyses of
various more special cases and related results by Conze and Lesigne [4, 5, 6],
Furstenberg and Weiss [9], Zhang [18], Host and Kra [12, 13], Frantzikinakis
and Kra [7] and Ziegler [19]. While Tao's proof rests on a conversion
to a finitary problem, we invoke only techniques from classical ergodic
theory, so giving a new proof of his result.



1 Introduction 

The setting for this work is a collection of $d$ commuting measure-preserving
actions $T_i : \mathbb{Z}^r \curvearrowright (X, \Sigma, \mu)$, $i = 1, 2, \ldots, d$, on a probability space.
We present a proof of the following result:

Theorem 1.1 (Convergence of multidimensional nonconventional ergodic averages).
If $T_i : \mathbb{Z}^r \curvearrowright (X, \Sigma, \mu)$ for $i = 1, 2, \ldots, d$ are commuting probability-preserving
$\mathbb{Z}^r$-actions, $(I_N)_{N \geq 1}$ is a Følner sequence of subsets of $\mathbb{Z}^r$, $(a_N)_{N \geq 1}$ is
a base-point sequence in $\mathbb{Z}^r$ and $f_1, f_2, \ldots, f_d \in L^\infty(\mu)$, then the nonconventional
ergodic averages

$$\frac{1}{|I_N|} \sum_{n \in a_N + I_N} \prod_{i=1}^d f_i \circ T_i^n$$

converge to some limit in $L^2(\mu)$ that does not depend on the choice of $(a_N)_{N \geq 1}$ or
$(I_N)_{N \geq 1}$.


The case of this result with $r = 1$ and the standard sequence of averaging sets
$I_N + a_N := \{1, 2, \ldots, N\}$ was first proved by Tao in [16]. Tao proceeds by first
demonstrating the equivalence of this result with a finitary assertion about the
behaviour of the restriction of our functions to large finite pieces of individual orbits.
This, in turn, is easily seen to be equivalent to a purely finitary result about the
behaviour of certain sequences of averages of 1-bounded functions on $(\mathbb{Z}/N\mathbb{Z})^d$
for very large $N$, and the bulk of Tao's work then goes into proving this last result.
Interestingly, Towsner has shown in [17] how the asymptotic behaviour of these
purely finitary averages can be re-interpreted back into an ergodic-theoretic assertion
by building a suitable 'proxy' probability-preserving system from these averages
themselves, using constructions from nonstandard analysis. Tao's method
of analysis extends quite straightforwardly to individual actions $T_i$ of higher rank $r$
and an arbitrary Følner sequence in $\mathbb{Z}^r$, provided the base-point shifts are all zero,
but it seems to require more work to yield a proof of the above base-point-uniform
version.

In this paper we shall give a different proof of Theorem 1.1 that uses only more
traditional infinitary techniques from ergodic theory. Our method is not affected
by shifting the base points of our averages. In particular, we recover a new proof
of the base-point-fixed case.

The further special case of Theorem 1.1 in which $r = 1$ and $T_i = T^{a_i}$ for some
fixed invertible probability-preserving transformation $T$ and sequence of integers
$a_1, a_2, \ldots, a_d$ has been the subject of considerable recent attention, with complete
proofs of this case appearing in works of Host and Kra [13] and of Ziegler [20].
These, in turn, build on techniques developed in previous papers for this or other
special cases of the theorem by Conze and Lesigne [4, 5, 6], Zhang [18] and Host
and Kra [12], and also on the analysis by Furstenberg and Weiss in [9] of averages
of the form $\frac{1}{N} \sum_{n=1}^N f \circ T^n \cdot g \circ T^{n^2}$ (which, we stress, do not constitute a special
case of Theorem 1.1 in view of the nonlinearity in the second exponent).

It is this last paper that first formally introduces the important notion of
'characteristic factors' for a system of averages of products: in our general setting, these
comprise a tuple $(\Xi_1, \Xi_2, \ldots, \Xi_d)$ of $T$-invariant $\sigma$-subalgebras of $\Sigma$ such that,
firstly,

$$\frac{1}{|I_N|} \sum_{n \in a_N + I_N} \prod_{i=1}^d f_i \circ T_i^n - \frac{1}{|I_N|} \sum_{n \in a_N + I_N} \prod_{i=1}^d \mathsf{E}_\mu[f_i \,|\, \Xi_i] \circ T_i^n \to 0$$

in $L^2(\mu)$ as $N \to \infty$ for any $f_1, f_2, \ldots, f_d \in L^\infty(\mu)$ and any choice of $(a_N)_{N \geq 1}$
and $(I_N)_{N \geq 1}$, so that convergence in general will follow if it can be established
when each $f_i$ is $\Xi_i$-measurable; and secondly such that these factors have a more
precisely-describable structure than the overall original system, so that the
asymptotic behaviour of the right-hand averages above can be analyzed explicitly.

This proof-scheme has not yet been successfully carried out in the general setting
of the present paper. The analyses of powers of a single transformation by Host
and Kra and by Ziegler both rely on achieving a very precise classification of all
possible characteristic factors in the form of 'nilsystems', within which setting
a bespoke analysis of the convergence of the relevant ergodic averages has been
carried out separately by Leibman [14]. In addition, Frantzikinakis and Kra have
shown in [7] that nilsystems re-appear in this role in the case of a more general
collection of invertible single transformations $T_i$ under the assumption that each $T_i$
and each difference $T_i T_j^{-1}$ for $i \neq j$ is ergodic, and they deduce the restriction of
Theorem 1.1 to this case also. However, without this extra ergodicity hypothesis
simple examples show that any tuple of characteristic factors for our system must
be much more complicated, and no good description of such a tuple is known.

We note in passing that in the course of their analysis in [13] of the case of powers
of a single transformation, Host and Kra also introduce the following 'cuboidal'
averages associated to a single action $S : \mathbb{Z} \curvearrowright (X, \Sigma, \mu)$:

$$\frac{1}{N^r} \sum_{n \in \{1, 2, \ldots, N\}^r} \prod_{\eta \in \{0,1\}^r} f_\eta \circ S^{\eta_1 n_1 + \eta_2 n_2 + \cdots + \eta_r n_r}.$$

Using their structural results they are able to prove convergence of these averages
also. This result amounts to a different instance of our Theorem 1.1 involving $2^r$
commuting $\mathbb{Z}^r$-actions, by defining $T_\eta^n := S^{\eta_1 n_1 + \eta_2 n_2 + \cdots + \eta_r n_r}$.

In this paper we shall use the possibility of projecting our input functions $f_i$ onto
special factors only in a rather softer way than in the works above. Noting that the
case $d = 1$ of Theorem 1.1 amounts simply to the von Neumann mean ergodic
theorem, we shall show that, if $d \geq 2$, and under the assumption that Theorem 1.1
holds for collections of $d - 1$ commuting $\mathbb{Z}^r$-actions, then from an arbitrary
$\mathbb{Z}^{rd}$-system $(X, \Sigma, \mu, T)$ we can always construct an extension $(\tilde{X}, \tilde{\Sigma}, \tilde{\mu}, \tilde{T})$ and then a
factor $\tilde{\Xi}$ of that extension such that, interpreting our nonconventional averages as
living inside the larger system $\tilde{X}$, we may replace the first function $f_1$ with its
projection $\mathsf{E}_{\tilde{\mu}}[f_1 \,|\, \tilde{\Xi}]$ in the evaluation of these averages, and this projection is then of
such a form that our nonconventional averages can be immediately approximated
by nonconventional averages involving only $d - 1$ actions. From this point a proof
of Theorem 1.1 follows quickly by induction on $d$.
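
For orientation, the base case $d = 1$ of this induction can be written out explicitly. The display below is only a sketch in the notation of Theorem 1.1; the symbol $\Sigma^{T_1}$ for the $\sigma$-algebra of $T_1$-invariant sets anticipates the notation of Section 2, and the only fact used is that the shifted sets $a_N + I_N$ again form a Følner sequence.

```latex
% Base case d = 1: the mean ergodic theorem along the shifted Folner sets a_N + I_N.
\[
  \frac{1}{|I_N|} \sum_{n \in a_N + I_N} f_1 \circ T_1^n
  \;\longrightarrow\;
  \mathsf{E}_\mu\big[ f_1 \,\big|\, \Sigma^{T_1} \big]
  \quad \text{in } L^2(\mu) \text{ as } N \to \infty .
\]
% The limit is the orthogonal projection of f_1 onto the T_1-invariant functions,
% so it depends neither on (I_N) nor on (a_N).
```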

It is interesting to note that this overall scheme of building an extension to a system 
with a certain additional property and then showing that this enables us to project 
just one of the functions contributing to our nonconventional averages onto a spe- 
cial factor of that extension is the same as that followed by Furstenberg and Weiss 
in [9]. However, the demands they make on their extension and the ways in which
they then exploit it are very different from ours, and at the level of finer detail 
there seems to be no overlap between the proofs. 

In fact, the resulting proof of convergence is much more direct than those
previously discovered for the case of powers of a single transformation (in addition
to avoiding Tao's conversion to a finitary problem). This is possibly not so
surprising: the construction we use to build our extended system $(\tilde{X}, \tilde{\Sigma}, \tilde{\mu}, \tilde{T})$ will
typically not respect any additional algebraic structure among the transformations
$T_i$. Even if these are powers of a single transformation, in general the $\tilde{T}_i$ will not
be, and thus as far as our proof is concerned this extra assumption lends us no
advantage. This is symptomatic of an important price that we pay in following
our shorter proof: unlike Host and Kra and Ziegler, we obtain essentially no
additional information about the final form that our nonconventional averages take.
We suspect that substantial new machinery will be needed in order to describe
these limits with any precision.

Finally, let us take this opportunity to stress that the substructures of a system
$(X, \Sigma, \mu, T)$ that are responsible for this complexity in the analysis of
nonconventional averages, although complicated and difficult to describe, are in a sense very
rare. This heuristic is made precise in the following observation: if the action $T$
is chosen generically (using the coarse topology on the collection of
probability-preserving actions on a fixed Lebesgue space $(X, \Sigma, \mu)$, say), then classical
arguments (see, for example, Chapter 8 of Nadkarni [15]) show that generically every
transformation $T^\gamma$ with $\gamma \in \mathbb{Z}^{rd} \setminus \{0\}$ is individually weakly mixing, and in this case
not only can our averages be shown to converge using a rather shorter argument (due
to Bergelson in [1]), but they converge simply to the product of the separate averages,
$\prod_{i=1}^d \int_X f_i \, d\mu$. We should like to propose a view of the present paper as a contribution
to understanding those rare, specially structured ways in which the averages associated
to our system can deviate from this 'purely random' behaviour.

Acknowledgements My thanks go to Vitaly Bergelson, John Griesmer, Bernard 
Host, Bryna Kra, Terence Tao and Tamar Ziegler for several helpful discussions 
and communications and to David Fremlin and an anonymous referee for several 
constructive suggestions for improvement. 

2 Some preliminary definitions and results 

Our interest in this paper is with a probability-preserving system
$T : \mathbb{Z}^{rd} \curvearrowright (X, \Sigma, \mu)$, for which we will always assume that the underlying measurable
space $(X, \Sigma)$ is standard Borel. Inside $\mathbb{Z}^{rd}$ we distinguish the subgroups
$\Gamma_1 := \mathbb{Z}^r \times \{0\}^{r(d-1)}$, $\Gamma_2 := \{0\}^r \times \mathbb{Z}^r \times \{0\}^{r(d-2)}$, ..., and $\Gamma_d := \{0\}^{r(d-1)} \times \mathbb{Z}^r$. Each
of these is canonically isomorphic to $\mathbb{Z}^r$ when written as a Cartesian product, as
here, and we write $\alpha_i : \mathbb{Z}^r \to \Gamma_i$ for these isomorphisms. We identify the
restrictions $T|_{\Gamma_1}, T|_{\Gamma_2}, \ldots, T|_{\Gamma_d}$ with the individual $\mathbb{Z}^r$-actions $T^{\alpha_i(\cdot)}$, and denote them
by $T_1, T_2, \ldots, T_d$ respectively. Note that, in this setting of group actions, all of
our transformations are implicitly invertible; routine arguments easily recover
versions of Theorem 1.1 suitable for collections of commuting non-invertible
transformations. We shall sometimes denote a probability-preserving system
alternatively by $(X, \Sigma, \mu, T)$.

We shall also handle several $\mu$-complete $T$-invariant $\sigma$-subalgebras of $\Sigma$. As is
standard in ergodic theory, we shall use the term factor either for such a
$\sigma$-subalgebra or for a probability-preserving intertwining map $\phi : (X, \Sigma, \mu, T) \to
(Y, \Xi, \nu, S)$; to any such $\phi$ we can associate the invariant $\sigma$-subalgebra given by
the $\mu$-completion of $\phi^{-1}[\Xi]$ inside $\Sigma$. Henceforth we shall abusively write $\phi^{-1}[\Xi]$
for this completed $\sigma$-algebra.

In particular, within our system we can identify the invariant factor comprising
all $A \in \Sigma$ such that $\mu(T^\gamma(A) \triangle A) = 0$ for every $\gamma \in \mathbb{Z}^{rd}$. This naturally inherits a
$\mathbb{Z}^{rd}$-action from the original system. We shall denote it by $\Sigma^T$. More generally, if $\Gamma$ is a subgroup
of $\mathbb{Z}^{rd}$, we can identify the factor left invariant by $\{T^\gamma : \gamma \in \Gamma\}$: extending the
above notation, we shall call this the $T|_\Gamma$-isotropy factor and write it $\Sigma^{T|_\Gamma}$. We
shall frequently refer to this factor in case $\Gamma$ is the subgroup
$\{\alpha_i(n) - \alpha_j(n) : n \in \mathbb{Z}^r\}$ for some $i \neq j$, in which case we write $\Sigma^{T_i = T_j}$ in place of
$\Sigma^{T|_{\{\alpha_i(n) - \alpha_j(n) :\, n \in \mathbb{Z}^r\}}}$. It will be centrally important throughout this paper that,
because $\mathbb{Z}^{rd}$ is Abelian, the isotropy factors $\Sigma^{T|_\Gamma}$ are $\mathbb{Z}^{rd}$-invariant for all
$\Gamma \leq \mathbb{Z}^{rd}$; for more general group actions this
invariance holds only if $\Gamma$ is a normal subgroup.
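
The invariance just mentioned can be checked in one line. The following is a sketch of that check, written for a set $A$ that is invariant under the subgroup action and an arbitrary $n \in \mathbb{Z}^{rd}$; it uses only that the transformations commute, are invertible and preserve $\mu$.

```latex
% Suppose \mu(T^\gamma(A) \triangle A) = 0 for all \gamma \in \Gamma, and fix n \in \mathbb{Z}^{rd}.
\[
  \mu\big( T^\gamma(T^n(A)) \,\triangle\, T^n(A) \big)
  = \mu\big( T^n\big( T^\gamma(A) \,\triangle\, A \big) \big)
  = \mu\big( T^\gamma(A) \,\triangle\, A \big)
  = 0
  \qquad \text{for all } \gamma \in \Gamma .
\]
% First equality: T^\gamma T^n = T^n T^\gamma (the acting group is Abelian) and T^n is invertible.
% Second equality: T^n preserves \mu. Hence T^n(A) is again T|_\Gamma-invariant.
```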

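We will assume familiarity with the product measurable space
$(X_1 \times X_2 \times \cdots \times X_d, \Sigma_1 \otimes \Sigma_2 \otimes \cdots \otimes \Sigma_d)$ associated to a family of measurable spaces $(X_i, \Sigma_i)$,
$i = 1, 2, \ldots, d$. Given measurable maps $\psi_i : X_i \to Y_i$ between such spaces we
shall write $\psi_1 \times \psi_2 \times \cdots \times \psi_d$ for their coordinate-wise product:

$$\psi_1 \times \psi_2 \times \cdots \times \psi_d (x_1, x_2, \ldots, x_d) := (\psi_1(x_1), \psi_2(x_2), \ldots, \psi_d(x_d)).$$

More generally, if $T_i : \mathbb{Z}^r \curvearrowright (X_i, \Sigma_i)$ is an action for $i = 1, 2, \ldots, d$ then we shall
write $T_1 \times T_2 \times \cdots \times T_d$ for the action
$\mathbb{Z}^r \curvearrowright (X_1 \times X_2 \times \cdots \times X_d, \Sigma_1 \otimes \Sigma_2 \otimes \cdots \otimes \Sigma_d)$
given by

$$(T_1 \times T_2 \times \cdots \times T_d)^n := T_1^n \times T_2^n \times \cdots \times T_d^n.$$

If all the $X_i$ are equal to $X$, all the $Y_i$ to $Y$ and all the $\psi_i$ to $\psi$ then we shall
abbreviate $\psi \times \psi \times \cdots \times \psi$ to $\psi^{\times d}$, and similarly for actions.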

The construction that we later use for our proof of Theorem 1.1 will also require
the standard notion of an inverse limit of probability-preserving systems; these
are treated, for example, in Examples 6.3 and Proposition 6.4 of Glasner [10]. In
addition to the results contained there, we need the following simple lemmas.

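Lemma 2.1 (Isotropy factors respect inverse limits). Suppose that

$$(X, \Sigma, \mu, T) = \lim_{m \leftarrow} (X^{(m)}, \Sigma^{(m)}, \mu^{(m)}, T^{(m)})$$

is an inverse limit of an increasing sequence of $\mathbb{Z}^{rd}$-systems with connecting maps
$\theta^{(m')}_{(m)} : X^{(m')} \to X^{(m)}$ for $m' \geq m$ and overall projections $\theta_{(m)} : X \to X^{(m)}$, and
that $\Gamma \leq \mathbb{Z}^{rd}$. Then

$$\Sigma^{T|_\Gamma} = \bigvee_{m \geq 1} \theta_{(m)}^{-1}\big[ (\Sigma^{(m)})^{T^{(m)}|_\Gamma} \big].$$

Proof It is clear that $\Sigma^{T|_\Gamma} \supseteq \theta_{(m)}^{-1}\big[ (\Sigma^{(m)})^{T^{(m)}|_\Gamma} \big]$ for every $m \geq 1$, and therefore
that $\Sigma^{T|_\Gamma} \supseteq \bigvee_{m \geq 1} \theta_{(m)}^{-1}\big[ (\Sigma^{(m)})^{T^{(m)}|_\Gamma} \big]$; it remains to prove the reverse inclusion.
Thus, suppose that $A \in \Sigma$ is $T|_\Gamma$-invariant. Then, by the construction of the
inverse limit, for any $\varepsilon > 0$ we can pick some $m \geq 1$ and some $A_\varepsilon \in \theta_{(m)}^{-1}[\Sigma^{(m)}]$
with $\mu(A \triangle A_\varepsilon) < \varepsilon$. This last inequality is equivalent to $\|1_A - 1_{A_\varepsilon}\|_1 < \varepsilon$. Since
$A$ is $T|_\Gamma$-invariant it follows that $\|1_A - 1_{A_\varepsilon} \circ T^\gamma\|_1 < \varepsilon$ for every $\gamma \in \Gamma$; hence,
letting $f$ be the ergodic average of $1_{A_\varepsilon}$ under the action of $T|_\Gamma$, we deduce that
$f \in L^\infty(\mu|_{\theta_{(m)}^{-1}[\Sigma^{(m)}]})$, $f$ is $T|_\Gamma$-invariant and $\|1_A - f\|_1 < \varepsilon$. Now taking a level-set
decomposition of $f$ yields $T|_\Gamma$-invariant sets in $\theta_{(m)}^{-1}[\Sigma^{(m)}]$ that approximate $A$: for
instance, the level set $\{f > 1/2\}$ satisfies $\mu(A \triangle \{f > 1/2\}) \leq 2\|1_A - f\|_1 < 2\varepsilon$.
Since $\varepsilon$ was arbitrary this shows that $A$ lies in $\bigvee_{m \geq 1} \theta_{(m)}^{-1}\big[ (\Sigma^{(m)})^{T^{(m)}|_\Gamma} \big]$,
as required. □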

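Lemma 2.2 (Joins respect inverse limits). Suppose that $(X, \Sigma, \mu)$ is a probability
space and that for each $i = 1, 2, \ldots, k$ we have a tower of $\sigma$-subalgebras
$\Xi_i^{(1)} \subseteq \Xi_i^{(2)} \subseteq \cdots \subseteq \Sigma$. Then

$$\bigvee_{m \geq 1} \big( \Xi_1^{(m)} \vee \Xi_2^{(m)} \vee \cdots \vee \Xi_k^{(m)} \big) = \Big( \bigvee_{m \geq 1} \Xi_1^{(m)} \Big) \vee \Big( \bigvee_{m \geq 1} \Xi_2^{(m)} \Big) \vee \cdots \vee \Big( \bigvee_{m \geq 1} \Xi_k^{(m)} \Big).$$

Proof For every $m \geq 1$ we have

$$\Xi_1^{(m)} \vee \Xi_2^{(m)} \vee \cdots \vee \Xi_k^{(m)} \subseteq \Big( \bigvee_{m \geq 1} \Xi_1^{(m)} \Big) \vee \Big( \bigvee_{m \geq 1} \Xi_2^{(m)} \Big) \vee \cdots \vee \Big( \bigvee_{m \geq 1} \Xi_k^{(m)} \Big) \subseteq \bigvee_{m \geq 1} \big( \Xi_1^{(m)} \vee \Xi_2^{(m)} \vee \cdots \vee \Xi_k^{(m)} \big)$$

and so taking the join over $m \geq 1$ of the left-hand side above gives the result. □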



3 The Furstenberg self-joining

Central to many of the older ergodic-theoretic analyses of special cases of
Theorem 1.1 is a certain multiple self-joining of the input $\mathbb{Z}^{rd}$-system $(X, \Sigma, \mu, T)$.
Given such a system and also a Følner sequence $(I_N)_{N \geq 1}$ and a base-point
sequence $(a_N)_{N \geq 1}$ we can consider the averages

$$\frac{1}{|I_N|} \sum_{n \in a_N + I_N} \int_X \prod_{i=1}^d f_i \circ T_i^n \, d\mu = \frac{1}{|I_N|} \sum_{n \in a_N + I_N} \int_X f_1 \cdot \prod_{i=2}^d f_i \circ (T_1^{-1} T_i)^n \, d\mu,$$

and now in view of the right-hand expression above, if we know only the
rank-$(d-1)$ case of Theorem 1.1 then we can deduce that these averages converge, and
it is routine to show (using the standard Borel nature of $(X, \Sigma)$) that the resulting
limit values define a probability measure $\mu^{*d}$ on the product measurable space
$(X^d, \Sigma^{\otimes d})$ by the condition that

$$\mu^{*d}(A_1 \times A_2 \times \cdots \times A_d) := \lim_{N \to \infty} \frac{1}{|I_N|} \sum_{n \in a_N + I_N} \int_X \prod_{i=1}^d (1_{A_i} \circ T_i^n) \, d\mu,$$

where we know that this is independent of the choice of $(a_N)_{N \geq 1}$ and $(I_N)_{N \geq 1}$. It
is now also clear that this measure $\mu^{*d}$ is invariant under the $\mathbb{Z}^r$-actions $S_i := T_i^{\times d}$
for $i = 1, 2, \ldots, d$ and also under $S_{d+1} := T_1 \times T_2 \times \cdots \times T_d$. We refer to
$(X^d, \Sigma^{\otimes d}, \mu^{*d})$ as the Furstenberg self-joining of the space $(X, \Sigma, \mu)$ associated
to the action $T$, in light of its historical genesis in Furstenberg's work on the
ergodic-theoretic approach to Szemerédi's Theorem ([8]); note, in particular, that
the one-dimensional marginals of $\mu^{*d}$ on $(X, \Sigma)$ all coincide with $\mu$. Given this
self-joining, we shall write $\pi_1, \pi_2, \ldots, \pi_d$ for the projection maps onto the $d$ copies
of $(X, \Sigma, \mu)$ that are its coordinate factors.
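
To make the definition concrete, here is a sketch of what it gives in the simplest case $d = 2$, where only the mean ergodic theorem is needed. The identification in the last line (with the isotropy factor $\Sigma^{T_1 = T_2}$ of Section 2) is a standard computation and is included for illustration only; it is not used in this form later.

```latex
\begin{align*}
  \mu^{*2}(A_1 \times A_2)
  &= \lim_{N \to \infty} \frac{1}{|I_N|} \sum_{n \in a_N + I_N}
     \int_X (1_{A_1} \circ T_1^n)\,(1_{A_2} \circ T_2^n) \, d\mu \\
  % T_1^{-n} preserves \mu and commutes with T_2^n:
  &= \lim_{N \to \infty} \int_X 1_{A_1} \cdot
     \Big( \frac{1}{|I_N|} \sum_{n \in a_N + I_N} 1_{A_2} \circ (T_1^{-1} T_2)^n \Big) \, d\mu \\
  % mean ergodic theorem for the \mathbb{Z}^r-action n \mapsto (T_1^{-1} T_2)^n:
  &= \int_X 1_{A_1} \cdot \mathsf{E}_\mu\big[ 1_{A_2} \,\big|\, \Sigma^{T_1 = T_2} \big] \, d\mu .
\end{align*}
```

In other words, for $d = 2$ the Furstenberg self-joining is the relatively independent self-joining of $\mu$ over $\Sigma^{T_1 = T_2}$.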

In the sequel we will need to work simultaneously with the Furstenberg
self-joinings of a system $(X, \Sigma, \mu, T)$ and an extension $\psi : (\tilde{X}, \tilde{\Sigma}, \tilde{\mu}, \tilde{T}) \to (X, \Sigma, \mu, T)$
of that system, in which case we can compute easily that the map $\psi^{\times d}$ identifies
$(\tilde{X}^d, \tilde{\Sigma}^{\otimes d}, \tilde{\mu}^{*d})$ as an extension of $(X^d, \Sigma^{\otimes d}, \mu^{*d})$, and we shall write
$\tilde{\pi}_1, \tilde{\pi}_2, \ldots, \tilde{\pi}_d$ for the coordinate-projections of this larger self-joining.

4 The proof of nonconventional average convergence 

We prove Theorem 1.1 by induction on $d$. As remarked above, the case $d = 1$
is simply the von Neumann mean ergodic theorem, so let us suppose that $d \geq 2$
and that the result is known to be true for all systems of at most $d - 1$ commuting
$\mathbb{Z}^r$-actions.

4.1 Characteristic factors and pleasant systems 

As indicated in the introduction, we shall use a rather simple instance of the notion
of 'characteristic factors':

Definition 4.1 (Characteristic factors). Given a system $T : \mathbb{Z}^{rd} \curvearrowright (X, \Sigma, \mu)$, a
sequence of characteristic factors for the nonconventional ergodic averages
associated to $T_1, T_2, \ldots, T_d$ is a tuple $(\Xi_1, \Xi_2, \ldots, \Xi_d)$ of $T$-invariant $\sigma$-subalgebras
of $\Sigma$ such that

$$\frac{1}{|I_N|} \sum_{n \in a_N + I_N} \prod_{i=1}^d f_i \circ T_i^n - \frac{1}{|I_N|} \sum_{n \in a_N + I_N} \prod_{i=1}^d \mathsf{E}_\mu[f_i \,|\, \Xi_i] \circ T_i^n \to 0$$

in $L^2(\mu)$ as $N \to \infty$ for any $f_1, f_2, \ldots, f_d \in L^\infty(\mu)$, Følner sequence $(I_N)_{N \geq 1}$
and base-point sequence $(a_N)_{N \geq 1}$.

Many previous results on special cases of Theorem 1.1 have relied on the
identification of a tuple of characteristic factors that could then be described quite
precisely, in the sense that they can be defined by factor maps of the original system to
certain concrete model systems in which a more detailed analysis of
nonconventional averages is feasible. Most strikingly, the analyses of Host and Kra in [13]
and Ziegler in [20] show that for powers of a single ergodic transformation there
is a single minimal characteristic factor (equal to all of the $\Xi_i$ above) that may be
identified with a model given by a $d$-step nilsystem, wherein the convergence of
the nonconventional averages and the form of their limits can be analyzed in great
detail.

Here we shall not be so ambitious. Various examples show that for a
sufficiently complicated system those functions measurable with respect to either $\Sigma^{T_1}$
or $\Sigma^{T_1 = T_i}$ for some $i = 2, 3, \ldots, d$ will behave differently (and, in particular,
contribute nontrivially) should they appear as $f_1$ in our averages, and so we expect
any tuple of characteristic factors to have $\Xi_1 \supseteq \Sigma^{T_1} \vee \bigvee_{i=2}^d \Sigma^{T_1 = T_i}$. In order to
explain our approach, let us first suppose that we are given a system in which we
may actually take this to be our first characteristic factor, and may simply take
$\Xi_i := \Sigma$ for $i = 2, 3, \ldots, d$.

Definition 4.2 (Pleasant system). We shall term a system $(X, \Sigma, \mu, T)$ pleasant if

$$\Big( \Sigma^{T_1} \vee \bigvee_{i=2}^d \Sigma^{T_1 = T_i},\ \Sigma,\ \Sigma,\ \ldots,\ \Sigma \Big)$$

is a tuple of characteristic factors.

Remark The idea of conditioning just one of the functions $f_i$ in our averages
onto a nontrivial factor already appears in Furstenberg and Weiss [9], in whose
terminology such a factor is 'partially characteristic'. ◁

The main observation of this subsection is that, given convergence of
nonconventional averages in general for systems of $d - 1$ actions, we can easily deduce
that convergence for pleasant systems of $d$ actions. Let us first record separately
an elementary robustness result for nonconventional averages that we shall need
shortly.

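Lemma 4.3. For any $f_1, f_2, \ldots, f_d \in L^\infty(\mu)$ and $N \geq 1$ we have

$$\Big\| \frac{1}{|I_N|} \sum_{n \in a_N + I_N} \prod_{i=1}^d f_i \circ T_i^n \Big\|_2 \leq \|f_1\|_2 \cdot \prod_{i=2}^d \|f_i\|_\infty.$$

Proof This is clear from the termwise estimate

$$\Big\| \prod_{i=1}^d f_i \circ T_i^n \Big\|_2 \leq \|f_1 \circ T_1^n\|_2 \cdot \prod_{i=2}^d \|f_i \circ T_i^n\|_\infty = \|f_1\|_2 \cdot \prod_{i=2}^d \|f_i\|_\infty$$

and the triangle inequality. □
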
Corollary 4.4. The nonconventional averages

$$\frac{1}{|I_N|} \sum_{n \in a_N + I_N} \prod_{i=1}^d f_i \circ T_i^n$$

converge in $L^2(\mu)$ for the $d$-tuple of functions $f_1, f_2, \ldots, f_d \in L^\infty(\mu)$ if the
corresponding averages are known to converge for all the $d$-tuples $f_1^{(m)}, f_2, \ldots, f_d$
for some sequence $f_1^{(m)} \in L^\infty(\mu)$ that converges to $f_1$ in $L^2(\mu)$. □
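
The corollary is stated without proof; the following sketch spells out the approximation argument behind it. The shorthand $A_N(h)$ for the $N$-th average with $h$ in place of $f_1$ (and $f_2, \ldots, f_d$ fixed) is introduced here only for this display and does not appear elsewhere in the paper.

```latex
% A_N(h) := \frac{1}{|I_N|} \sum_{n \in a_N + I_N} (h \circ T_1^n) \prod_{i=2}^d f_i \circ T_i^n  (linear in h).
\begin{align*}
  \| A_N(f_1) - A_{N'}(f_1) \|_2
  &\leq \| A_N(f_1 - f_1^{(m)}) \|_2
      + \| A_N(f_1^{(m)}) - A_{N'}(f_1^{(m)}) \|_2
      + \| A_{N'}(f_1^{(m)} - f_1) \|_2 \\
  % Lemma 4.3 applied to the first and third terms:
  &\leq 2\, \| f_1 - f_1^{(m)} \|_2 \prod_{i=2}^d \| f_i \|_\infty
      + \| A_N(f_1^{(m)}) - A_{N'}(f_1^{(m)}) \|_2 .
\end{align*}
% Choose m to make the first term small, then N, N' large to make the second small:
% (A_N(f_1))_N is Cauchy in L^2(\mu), hence convergent.
```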

Proposition 4.5 (Nonconventional average convergence for pleasant systems). If
$T : \mathbb{Z}^{rd} \curvearrowright (X, \Sigma, \mu)$ is pleasant and Theorem 1.1 is known to hold for all systems
of $d - 1$ commuting actions, then its conclusion also holds for $(X, \Sigma, \mu, T)$.

Proof Writing $\Xi := \Sigma^{T_1} \vee \bigvee_{i=2}^d \Sigma^{T_1 = T_i}$, since the system is pleasant Definition 4.1 tells us that

$$\frac{1}{|I_N|} \sum_{n \in a_N + I_N} \prod_{i=1}^d f_i \circ T_i^n - \frac{1}{|I_N|} \sum_{n \in a_N + I_N} (\mathsf{E}_\mu[f_1 \,|\, \Xi] \circ T_1^n) \cdot \prod_{i=2}^d f_i \circ T_i^n \to 0$$

in $L^2(\mu)$ for any $f_1, f_2, \ldots, f_d \in L^\infty(\mu)$, and so it suffices to prove the desired
convergence under the additional assumption that $f_1$ is $\Xi$-measurable. However, in this
case we know that we can approximate $f_1$ in $L^2(\mu)$ by finite sums of the form
$\sum_{k=1}^K g_{1,k} \cdot g_{2,k} \cdots g_{d,k}$ where $g_{1,k} \in L^\infty(\mu|_{\Sigma^{T_1}})$ and $g_{i,k} \in L^\infty(\mu|_{\Sigma^{T_1 = T_i}})$ for
$i = 2, 3, \ldots, d$. Hence by linearity and Corollary 4.4 it suffices to prove
convergence for the averages obtained when $f_1$ is replaced by a single such product:

$$\frac{1}{|I_N|} \sum_{n \in a_N + I_N} \big( (g_1 \cdot g_2 \cdots g_d) \circ T_1^n \big) \cdot \prod_{i=2}^d f_i \circ T_i^n;$$

but now the different invariances that we are assuming for each $g_i$ imply that
$g_1 \circ T_1^n = g_1$ and $g_i \circ T_1^n = g_i \circ T_i^n$ for $i = 2, 3, \ldots, d$ and all $n \in \mathbb{Z}^r$, and so the
above is simply equal to

$$g_1 \cdot \frac{1}{|I_N|} \sum_{n \in a_N + I_N} \prod_{i=2}^d (g_i f_i) \circ T_i^n.$$

This is a product by the fixed bounded function $g_1$ of a nonconventional ergodic
average associated to the $d - 1$ commuting actions $T_2, T_3, \ldots, T_d$, and we already
know by inductive hypothesis that these converge in $L^2(\mu)$. This completes the
proof. □
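
As an illustration of the last step, here is a sketch of the same computation in the case $d = 2$, where the inductive hypothesis reduces to the mean ergodic theorem and the limit can be named explicitly. The functions $g_1$, $g_2$ are as in the proof above, and the final identification of the limit is an addition for illustration.

```latex
% d = 2, with g_1 measurable w.r.t. \Sigma^{T_1} and g_2 measurable w.r.t. \Sigma^{T_1 = T_2}:
\begin{align*}
  \frac{1}{|I_N|} \sum_{n \in a_N + I_N} \big( (g_1 g_2) \circ T_1^n \big) \cdot (f_2 \circ T_2^n)
  &= g_1 \cdot \frac{1}{|I_N|} \sum_{n \in a_N + I_N} (g_2 f_2) \circ T_2^n \\
  &\longrightarrow\; g_1 \cdot \mathsf{E}_\mu\big[ g_2 f_2 \,\big|\, \Sigma^{T_2} \big]
  \quad \text{in } L^2(\mu) \text{ as } N \to \infty .
\end{align*}
% First line: g_1 \circ T_1^n = g_1 and g_2 \circ T_1^n = g_2 \circ T_2^n.
% Second line: the mean ergodic theorem for T_2, and boundedness of g_1.
```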

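Unsurprisingly, there are well-known examples of systems that are unpleasant: for
example, the general $d$-step nilsystems that emerge in the Host-Kra and Ziegler
analyses are such. The simplest example from among these is the following: if $R_\alpha$
is an irrational rotation on $(X, \Sigma, \mu) := (\mathbb{T}, \mathrm{Borel}, \mathrm{Haar})$ and we set $T_1 := R_\alpha$,
$T_2 := R_{2\alpha} = T_1^2$, then we can check easily that $\Sigma^{T_1}$, $\Sigma^{T_2}$ and $\Sigma^{T_1 = T_2}$ are all
trivial, but on the other hand if $f_2 \in \hat{\mathbb{T}} \setminus \{1_{\mathbb{T}}\}$ and $f_1 := \overline{f_2}^2$ then $f_1$ and $f_2$ are
both orthogonal to the trivial factor but give

$$\frac{1}{N} \sum_{n=1}^N f_1(T_1^n(t)) f_2(T_2^n(t)) = \frac{1}{N} \sum_{n=1}^N \overline{f_2(t)}^2 f_2(t) \cdot \overline{f_2(\alpha)}^{2n} f_2(2\alpha)^n = \overline{f_2(t)} \not\to 0$$

as $N \to \infty$.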
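
For a fully explicit instance of this example, one may take the character $f_2(t) = e^{2\pi i t}$ on $\mathbb{T} = \mathbb{R}/\mathbb{Z}$; this particular choice is an assumption made here purely for illustration.

```latex
% Take f_2(t) = e^{2\pi i t}, so that f_1(t) = \overline{f_2(t)}^2 = e^{-4\pi i t}.
\[
  f_1(t + n\alpha)\, f_2(t + 2n\alpha)
  = e^{-4\pi i (t + n\alpha)} \, e^{2\pi i (t + 2n\alpha)}
  = e^{-2\pi i t}
  \qquad \text{for every } n \geq 1 ,
\]
% so each average over n = 1, ..., N equals e^{-2\pi i t}, of L^2-norm 1.
% By contrast, conditioning f_1 onto the trivial factor gives \int_{\mathbb{T}} f_1 \, d\mu = 0,
% so the averages with f_1 replaced by its conditional expectation are identically 0.
```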

However, it turns out that we can repair this situation by passing to a suitable 
extension. 

Proposition 4.6 (All systems have pleasant extensions). Any $\mathbb{Z}^{rd}$-system $(X, \Sigma, \mu, T)$
admits a pleasant extension $\psi : (\tilde{X}, \tilde{\Sigma}, \tilde{\mu}, \tilde{T}) \to (X, \Sigma, \mu, T)$.

From this point, Theorem 1.1 follows at once, since it is clear that the theorem
holds for any system if it holds for some extension of that system. Proposition 4.6
forms the technical heart of this paper, and we shall prove it in the next subsection.



4.2 Building a pleasant extension 

We shall build our pleasant extension using the machinery of Furstenberg
self-joinings. By the remarks of Section 3, given the conclusions of Theorem 1.1 for
systems of $d - 1$ commuting $\mathbb{Z}^r$-actions and a system $T : \mathbb{Z}^{rd} \curvearrowright (X, \Sigma, \mu)$ we may
form the Furstenberg self-joining $(X^d, \Sigma^{\otimes d}, \mu^{*d})$. Our deduction of pleasantness
for our constructed extension will rest on the following key estimate.

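Lemma 4.7 (The Furstenberg self-joining controls nonconventional averages). If
$f_1 \in L^\infty(\mu)$ is such that

$$\int_{X^d} f_1 \circ \pi_1 \cdot \Big( \prod_{i=2}^d f_i \circ \pi_i \Big) \cdot g \, d\mu^{*d} = 0$$

for every choice of $f_2, f_3, \ldots, f_d \in L^\infty(\mu)$ and of another function
$g \in L^\infty(\mu^{*d}|_{(\Sigma^{\otimes d})^{S_{d+1}}})$, then also

$$\frac{1}{|I_N|} \sum_{n \in a_N + I_N} \prod_{i=1}^d f_i \circ T_i^n \to 0$$

in $L^2(\mu)$ for every choice of $f_2, f_3, \ldots, f_d \in L^\infty(\mu)$ and any Følner sequence
$(I_N)_{N \geq 1}$ and base-point sequence $(a_N)_{N \geq 1}$.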

Remark Versions of this result have appeared repeatedly in previous analyses
of more special cases of our main result; consider, for example, Proposition 5.3 of
Zhang [18] or Subsection 6.3 of Ziegler [20]. The standard proof applies
essentially unchanged in the general setting, and we include the details here largely for
completeness. ◁

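Proof Suppose that $f_1, f_2, \ldots, f_d \in L^\infty(\mu)$ satisfy the assumptions of the
lemma. By the classical higher-rank van der Corput Lemma (see, for example,
the discussion in Bergelson, McCutcheon and Zhang [3]) applied to the bounded
$\mathbb{Z}^r$-indexed family $\prod_{i=1}^d f_i \circ T_i^n$ we need only prove that

$$\frac{1}{M^{2r}} \sum_{m_1, m_2 \in \{1, 2, \ldots, M\}^r} \frac{1}{|I_N|} \sum_{n \in a_N + I_N} \int_X \prod_{i=1}^d (f_i \circ T_i^{n + m_1}) \cdot \prod_{i=1}^d (f_i \circ T_i^{n + m_2}) \, d\mu$$

$$= \frac{1}{M^{2r}} \sum_{m_1, m_2 \in \{1, 2, \ldots, M\}^r} \frac{1}{|I_N|} \sum_{n \in a_N + I_N} \int_X \prod_{i=1}^d \big( (f_i \circ T_i^{m_1} \cdot f_i \circ T_i^{m_2}) \circ T_i^n \big) \, d\mu \to 0$$

as $N \to \infty$ and then $M \to \infty$. However, by the definition of the Furstenberg
self-joining, and its invariance under $S_{d+1}$, we know that

$$\frac{1}{|I_N|} \sum_{n \in a_N + I_N} \int_X \prod_{i=1}^d \big( (f_i \circ T_i^{m_1} \cdot f_i \circ T_i^{m_2}) \circ T_i^n \big) \, d\mu \to \int_{X^d} \prod_{i=1}^d (f_i \cdot f_i \circ T_i^{m_2 - m_1}) \circ \pi_i \, d\mu^{*d}$$

as $N \to \infty$. Now, when we average these limiting values over $m_1$ and $m_2 \in
\{1, 2, \ldots, M\}^r$, we clearly obtain convex combinations of uniform averages over
increasingly large ranges of $m_2 - m_1$ of the last expression above, and so appealing
to the usual mean ergodic theorem for the $\mathbb{Z}^r$-action $S_{d+1} := T_1 \times T_2 \times \cdots \times T_d$
in $L^2(\mu^{*d})$ we deduce that our above double averages converge to

$$\int_{X^d} \Big( \prod_{j=1}^d f_j \circ \pi_j \Big) \cdot \Big( \lim_{M \to \infty} \frac{1}{M^r} \sum_{m \in \{1, 2, \ldots, M\}^r} \prod_{i=1}^d f_i \circ T_i^m \circ \pi_i \Big) \, d\mu^{*d}.$$

Setting

$$g := \lim_{M \to \infty} \frac{1}{M^r} \sum_{m \in \{1, 2, \ldots, M\}^r} \prod_{j=1}^d f_j \circ T_j^m \circ \pi_j,$$

which is $S_{d+1}$-invariant, this is precisely an integral of the form that we are assuming vanishes, as required.

□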
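
For readers unfamiliar with the van der Corput Lemma invoked above, the following is a sketch of a standard rank-one ($r = 1$) formulation for a bounded sequence in a Hilbert space. It is given for orientation only: it is not claimed to be the exact higher-rank statement of [3] that the proof uses, and the sequence $(u_n)$ and space $\mathcal{H}$ are notation introduced just for this display.

```latex
% A rank-one van der Corput lemma (illustrative formulation).
% Let (u_n)_{n \geq 1} be a bounded sequence in a Hilbert space \mathcal{H}.
\[
  \lim_{M \to \infty} \frac{1}{M} \sum_{m=1}^{M}
  \limsup_{N \to \infty}
  \Big| \frac{1}{N} \sum_{n=1}^{N} \langle u_{n+m}, u_n \rangle \Big| = 0
  \quad \Longrightarrow \quad
  \Big\| \frac{1}{N} \sum_{n=1}^{N} u_n \Big\|_{\mathcal{H}} \longrightarrow 0
  \quad \text{as } N \to \infty .
\]
% In the proof above the role of u_n is played by \prod_{i=1}^d f_i \circ T_i^n in L^2(\mu),
% with n ranging over \mathbb{Z}^r and averages taken over the sets a_N + I_N.
```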

We are now in a position to construct our pleasant extension. 

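Proof of Proposition 4.6 We need to find an extension $(\tilde{X}, \tilde{\Sigma}, \tilde{\mu}, \tilde{T})$ such that,
setting

$$\tilde{\Xi} := \tilde{\Sigma}^{\tilde{T}_1} \vee \tilde{\Sigma}^{\tilde{T}_2 = \tilde{T}_1} \vee \cdots \vee \tilde{\Sigma}^{\tilde{T}_d = \tilde{T}_1},$$

we have

$$\frac{1}{|I_N|} \sum_{n \in a_N + I_N} \prod_{i=1}^d f_i \circ \tilde{T}_i^n - \frac{1}{|I_N|} \sum_{n \in a_N + I_N} (\mathsf{E}_{\tilde{\mu}}[f_1 \,|\, \tilde{\Xi}] \circ \tilde{T}_1^n) \cdot \prod_{i=2}^d f_i \circ \tilde{T}_i^n \to 0$$

in $L^2(\tilde{\mu})$ for any $f_1, f_2, \ldots, f_d \in L^\infty(\tilde{\mu})$. By Lemma 4.7, applied to
$f_1 - \mathsf{E}_{\tilde{\mu}}[f_1 \,|\, \tilde{\Xi}]$, this will follow if we can guarantee instead that

$$\int_{\tilde{X}^d} f_1 \circ \tilde{\pi}_1 \cdot \Big( \prod_{i=2}^d f_i \circ \tilde{\pi}_i \Big) \cdot g \, d\tilde{\mu}^{*d} = \int_{\tilde{X}^d} \mathsf{E}_{\tilde{\mu}}[f_1 \,|\, \tilde{\Xi}] \circ \tilde{\pi}_1 \cdot \Big( \prod_{i=2}^d f_i \circ \tilde{\pi}_i \Big) \cdot g \, d\tilde{\mu}^{*d}$$

for every choice of $f_1, f_2, \ldots, f_d \in L^\infty(\tilde{\mu})$ and of another function
$g \in L^\infty(\tilde{\mu}^{*d}|_{(\tilde{\Sigma}^{\otimes d})^{\tilde{S}_{d+1}}})$.

We shall show that this obtains for the inverse limit of a tower of extensions of
$(X, \Sigma, \mu, T)$ constructed from the Furstenberg self-joinings themselves.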

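Step 1: construction of the extension Given the original system $(X, \Sigma, \mu, T)$
we define an extension $\psi^{(1)} : (X^{(1)}, \Sigma^{(1)}, \mu^{(1)}, T^{(1)}) \to (X, \Sigma, \mu, T)$ by setting
$(X^{(1)}, \Sigma^{(1)}, \mu^{(1)}) := (X^d, \Sigma^{\otimes d}, \mu^{*d})$, $\psi^{(1)} := \pi_1$ and with the $\mathbb{Z}^r$-actions

$$T_1^{(1)} := S_{d+1}, \quad T_2^{(1)} := S_2, \quad \ldots, \quad T_d^{(1)} := S_d$$

(note that we lift $T_1$ to $S_{d+1}$, rather than to $S_1$). We may now iterate this
construction on the systems that emerge from it to build a whole tower of extensions
$(X^{(m)}, \Sigma^{(m)}, \mu^{(m)}, T^{(m)}) \to (X^{(m-1)}, \Sigma^{(m-1)}, \mu^{(m-1)}, T^{(m-1)})$ for $m \geq 1$, where
we set $(X^{(0)}, \Sigma^{(0)}, \mu^{(0)}, T^{(0)}) := (X, \Sigma, \mu, T)$. Note that since each $(X^{(m+1)}, \Sigma^{(m+1)}, \mu^{(m+1)})$
is the $d$-fold Furstenberg self-joining of $(X^{(m)}, \Sigma^{(m)}, \mu^{(m)})$, in addition to the
factor map $\psi_1^{(m+1)}$ given by the projection onto the first coordinate in this self-joining
it carries $d - 1$ other such maps corresponding to the projections onto the other
coordinates; let us denote these by $\psi_2^{(m+1)}, \psi_3^{(m+1)}, \ldots, \psi_d^{(m+1)}$.

We will take $(\tilde{X}, \tilde{\Sigma}, \tilde{\mu}, \tilde{T})$ to be the inverse limit $\lim_{m \leftarrow} (X^{(m)}, \Sigma^{(m)}, \mu^{(m)}, T^{(m)})$,
and show that this has the desired property. Write $\psi : \tilde{X} \to X$ for the overall
factor map back onto the original probability space, $\theta^{(m')}_{(m)} : X^{(m')} \to X^{(m)}$ for
the connecting projections of our inverse system, and also $\theta_{(m)} : \tilde{X} \to X^{(m)}$ for
the overall projection from the limit system, so that $\psi = \theta_{(0)}$. Write $\pi_i^{(m)}$ for
the coordinate projections $(X^{(m)})^d \to X^{(m)}$ and $\tilde{\pi}_i$ for the coordinate projections
$\tilde{X}^d \to \tilde{X}$. Finally, let

$$\Xi^{(m)} := (\Sigma^{(m)})^{T_1^{(m)}} \vee (\Sigma^{(m)})^{T_2^{(m)} = T_1^{(m)}} \vee \cdots \vee (\Sigma^{(m)})^{T_d^{(m)} = T_1^{(m)}}$$

and

$$\tilde{\Xi} := \tilde{\Sigma}^{\tilde{T}_1} \vee \tilde{\Sigma}^{\tilde{T}_2 = \tilde{T}_1} \vee \cdots \vee \tilde{\Sigma}^{\tilde{T}_d = \tilde{T}_1};$$

combining Lemmas 2.1 and 2.2 we deduce that $\tilde{\Xi} = \bigvee_{m \geq 1} \theta_{(m)}^{-1}[\Xi^{(m)}]$.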

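We can depict the tower of systems constructed above in the following commutative
diagram:

[Commutative diagram: the tower of systems
$\cdots \to (X^{(2)}, \Sigma^{(2)}, \mu^{(2)}) \to (X^{(1)}, \Sigma^{(1)}, \mu^{(1)}) \to (X, \Sigma, \mu)$ with connecting maps $\theta^{(m+1)}_{(m)}$;
the parallel tower of self-joinings
$\cdots \to ((X^{(2)})^d, (\Sigma^{(2)})^{\otimes d}, (\mu^{(2)})^{*d}) \to ((X^{(1)})^d, (\Sigma^{(1)})^{\otimes d}, (\mu^{(1)})^{*d}) \to \cdots$
with connecting maps $(\theta^{(m+1)}_{(m)})^{\times d}$; and the first-coordinate projections $\pi_1^{(m)}$ from each
self-joining down to $(X^{(m)}, \Sigma^{(m)}, \mu^{(m)})$.]
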

where, in addition, by construction we have

$$(X^{(m+1)}, \Sigma^{(m+1)}, \mu^{(m+1)}) = ((X^{(m)})^d, (\Sigma^{(m)})^{\otimes d}, (\mu^{(m)})^{*d})$$

for every $m \geq 0$, with the actions $T_i^{(m+1)}$ selected from among the $S_j^{(m)}$ as above,
and under this identification the maps $\theta^{(m+1)}_{(m)}$ and $\pi_1^{(m)}$ agree. On the other hand,
the maps $\pi_1^{(m+1)}$ and $(\theta^{(m+1)}_{(m)})^{\times d}$ do not agree.

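Step 2: proof of pleasantness We will now prove that for any $f_1, f_2, \ldots, f_d \in
L^\infty(\tilde{\mu})$ and $g \in L^\infty(\tilde{\mu}^{*d}|_{(\tilde{\Sigma}^{\otimes d})^{\tilde{S}_{d+1}}})$ we have

$$\int_{\tilde{X}^d} f_1 \circ \tilde{\pi}_1 \cdot \Big( \prod_{i=2}^d f_i \circ \tilde{\pi}_i \Big) \cdot g \, d\tilde{\mu}^{*d} = \int_{\tilde{X}^d} \mathsf{E}_{\tilde{\mu}}[f_1 \,|\, \tilde{\Xi}] \circ \tilde{\pi}_1 \cdot \Big( \prod_{i=2}^d f_i \circ \tilde{\pi}_i \Big) \cdot g \, d\tilde{\mu}^{*d}.$$

By continuity in $L^2(\tilde{\mu})$ and the definition of inverse limit, we may assume further
that there is some finite $m \geq 1$ such that $f_i = f_i' \circ \theta_{(m)}$ and $g = g' \circ \theta_{(m)}^{\times d}$ for
some $f_1', f_2', \ldots, f_d' \in L^\infty(\mu^{(m)})$ and $g' \in L^\infty((\mu^{(m)})^{*d}|_{((\Sigma^{(m)})^{\otimes d})^{S^{(m)}_{d+1}}})$. Given this the
left-hand expression above can be re-written at level $m$ as

$$\int_{(X^{(m)})^d} f_1' \circ \pi_1^{(m)} \cdot \Big( \prod_{i=2}^d f_i' \circ \pi_i^{(m)} \Big) \cdot g' \, d(\mu^{(m)})^{*d}.$$

For any $m' \geq m$, since $((X^{(m')})^d, (\Sigma^{(m')})^{\otimes d}, (\mu^{(m')})^{*d}) =: (X^{(m'+1)}, \Sigma^{(m'+1)}, \mu^{(m'+1)})$,
the left-hand side above can also be re-written as

$$\int_{\tilde{X}} (f_1' \circ \theta^{(m'+1)}_{(m)} \circ \theta_{(m'+1)}) \cdot \Big( \prod_{i=2}^d (f_i' \circ \theta^{(m')}_{(m)}) \circ \psi_i^{(m'+1)} \circ \theta_{(m'+1)} \Big) \cdot \big( g' \circ (\theta^{(m')}_{(m)})^{\times d} \circ \theta_{(m'+1)} \big) \, d\tilde{\mu}.$$

Now, the function $(f_i' \circ \theta^{(m')}_{(m)}) \circ \psi_i^{(m'+1)}$ is invariant under the $\mathbb{Z}^r$-action

$$\big( T_i^{(m')} (T_1^{(m')})^{-1} \times \cdots \times \mathrm{id}_{X^{(m')}} \times \cdots \times T_i^{(m')} (T_d^{(m')})^{-1} \big) = T_i^{(m'+1)} (T_1^{(m'+1)})^{-1}$$

for each $i = 2, 3, \ldots, d$, and the function $g' \circ (\theta^{(m')}_{(m)})^{\times d}$ is invariant under $S^{(m')}_{d+1} =:
T_1^{(m'+1)}$. Therefore in the integral above all factors save the first are
$\theta_{(m'+1)}^{-1}[\Xi^{(m'+1)}]$-measurable, and so we may condition $f_1' \circ \theta^{(m'+1)}_{(m)}$ onto $\Xi^{(m'+1)}$ and conclude
overall that

$$\int_{(X^{(m)})^d} f_1' \circ \pi_1^{(m)} \cdot \Big( \prod_{i=2}^d f_i' \circ \pi_i^{(m)} \Big) \cdot g' \, d(\mu^{(m)})^{*d}$$

$$= \int_{\tilde{X}} \big( \mathsf{E}[f_1' \circ \theta^{(m'+1)}_{(m)} \,|\, \Xi^{(m'+1)}] \circ \theta_{(m'+1)} \big) \cdot \Big( \prod_{i=2}^d (f_i' \circ \theta^{(m')}_{(m)}) \circ \psi_i^{(m'+1)} \circ \theta_{(m'+1)} \Big) \cdot \big( g' \circ (\theta^{(m')}_{(m)})^{\times d} \circ \theta_{(m'+1)} \big) \, d\tilde{\mu}.$$

Since

$$\mathsf{E}[f_1' \circ \theta^{(m')}_{(m)} \,|\, \Xi^{(m')}] \circ \theta_{(m')} \to \mathsf{E}_{\tilde{\mu}}[f_1' \circ \theta_{(m)} \,|\, \tilde{\Xi}] = \mathsf{E}_{\tilde{\mu}}[f_1 \,|\, \tilde{\Xi}]$$

in $L^2(\tilde{\mu})$ as $m' \to \infty$, and hence

$$\mathsf{E}[f_1' \circ \theta^{(m'+1)}_{(m)} \,|\, \Xi^{(m'+1)}] \circ \theta_{(m'+1)} - \mathsf{E}[f_1' \circ \theta^{(m')}_{(m)} \,|\, \Xi^{(m')}] \circ \theta_{(m')} \to 0 \quad \text{in } L^2(\tilde{\mu}) \text{ as } m' \to \infty,$$

we next deduce that

$$\int_{\tilde{X}} \big( \mathsf{E}[f_1' \circ \theta^{(m')}_{(m)} \,|\, \Xi^{(m')}] \circ \theta_{(m')} \big) \cdot \Big( \prod_{i=2}^d (f_i' \circ \theta^{(m')}_{(m)}) \circ \psi_i^{(m'+1)} \circ \theta_{(m'+1)} \Big) \cdot \big( g' \circ (\theta^{(m')}_{(m)})^{\times d} \circ \theta_{(m'+1)} \big) \, d\tilde{\mu}$$

converges to the level-$m$ integral above as $m' \to \infty$. However, by the law of iterated
conditional expectation and exactly analogous reasoning to that above, applied with $m'$ in place
of $m$ and the collection of functions $\mathsf{E}[f_1' \circ \theta^{(m')}_{(m)} \,|\, \Xi^{(m')}] \circ \theta_{(m')}$, $f_i' \circ \theta_{(m)} = (f_i' \circ
\theta^{(m')}_{(m)}) \circ \theta_{(m')}$ for $i = 2, 3, \ldots, d$ and $g' \circ \theta_{(m)}^{\times d} = (g' \circ (\theta^{(m')}_{(m)})^{\times d}) \circ \theta_{(m')}^{\times d}$, we deduce
that this last integral is equal to

$$\int_{\tilde{X}^d} \big( \mathsf{E}[f_1' \circ \theta^{(m')}_{(m)} \,|\, \Xi^{(m')}] \circ \theta_{(m')} \circ \tilde{\pi}_1 \big) \cdot \Big( \prod_{i=2}^d f_i \circ \tilde{\pi}_i \Big) \cdot g \, d\tilde{\mu}^{*d},$$

and this tends to

$$\int_{\tilde{X}^d} \big( \mathsf{E}_{\tilde{\mu}}[f_1 \,|\, \tilde{\Xi}] \circ \tilde{\pi}_1 \big) \cdot \Big( \prod_{i=2}^d f_i \circ \tilde{\pi}_i \Big) \cdot g \, d\tilde{\mu}^{*d}$$

as $m' \to \infty$, as required. □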

It is clear that the assertion of Theorem 1.1 must hold for any system if it holds
for some extension of that system, and so, as remarked previously, it now follows
in full generality by combining Proposition 4.5 and Proposition 4.6. □

Remarks Intuitively, at each step in our iterative construction of the tower of
extensions

$$(X, \Sigma, \mu, T) \leftarrow (X^{(1)}, \Sigma^{(1)}, \mu^{(1)}, T^{(1)}) \leftarrow (X^{(2)}, \Sigma^{(2)}, \mu^{(2)}, T^{(2)}) \leftarrow \cdots$$

we are introducing a new supply of functions that are invariant under either $T_1^{(m)}$
or some $T_i^{(m)} (T_1^{(m)})^{-1}$, and that can contribute to building a conditional expectation of $f_1$ that
will serve as a good approximation to it for the purpose of evaluating our integral.
However, at each such step we introduce new functions on the larger system that
we will also then need to handle in this way, and these will not be taken care of
until the next extension. It is for this reason that the present construction relies on
the passage all the way to an inverse limit.

Considering informally how the pleasant extension enables us to bring the proof
of Proposition 4.5 to bear on a more general system, we can locate the concrete
appearance of the extension $(\tilde{X}, \tilde{\Sigma}, \tilde{\mu}, \tilde{T})$ when we approximate $f_1$ by
$\sum_{k=1}^K g_{1,k} \cdot g_{2,k} \cdots g_{d,k}$: the point is that while this sum overall approximates a function on
the smaller system $(X, \Sigma, \mu, T)$, the individual functions $g_{i,k}$ that appear within it
do not, and then when we separately replace composition with $\tilde{T}_1^n$ by $\tilde{T}_i^n$ for these
functions this requires us to keep track of their individual orbits inside $L^\infty(\tilde{\mu})$,
which will in general not be confined to $L^\infty(\mu)$. ◁

5 Discussion



5.1 Alternative constructions of the extension 

The scheme we have adopted to construct our pleasant inverse limit extension 
$(\tilde X, \tilde\Sigma, \tilde\mu, \tilde T)$ of $(X, \Sigma, \mu, T)$ is far from 
canonical. In particular, there is more 
than one way to use some self-joining of $(X, \mu)$ built using the original 
transformations $T$ to control the convergence of nonconventional averages, as we have 
done with the Furstenberg self-joining via Lemma 4.7. While this choice seems 
particularly well-adapted to giving a quick inductive proof of Theorem 1.1, it may 
be instructive to describe briefly an alternative such self-joining that could be used 
in a similar way. This is a simple generalization of the space 
$(X^{[d]}, \Sigma^{[d]}, \mu^{[d]})$ constructed by Host and Kra for their proof in [13] 
of Theorem 1.1 in the case of 
powers of a single transformation. 

Given our original system $(X, \Sigma, \mu, T)$, we construct a sequence of self-joinings 
$(X^{[1]}, \Sigma^{[1]}, \mu^{[1]}, T^{[1]})$, $(X^{[2]}, \Sigma^{[2]}, \mu^{[2]}, T^{[2]})$, 
..., $(X^{[d]}, \Sigma^{[d]}, \mu^{[d]}, T^{[d]})$, where each 
$(X^{[k]}, \Sigma^{[k]}, \mu^{[k]}, T^{[k]})$ is a $2^k$-fold self-joining of 
$(X, \Sigma, \mu, T)$, iteratively as follows. 
First set $(X^{[1]}, \Sigma^{[1]}) := (X^2, \Sigma^{\otimes 2})$ and let $\mu^{[1]}$ be the 
relatively independent self-joining $\mu \otimes_{\Sigma^{T_1}} \mu$ of $\mu$ over the 
isotropy factor $\Sigma^{T_1}$ (see, for example, Section 6.1 of 
Glasner [10] for the general construction of relatively independent self-joinings). 
In addition, lift $T_1$ to $T_1^{[1]} := T_1 \times \mathrm{id}_X$ and $T_i$ to 
$T_i^{[1]} := T_i \times T_i$ for $i = 2, 3, \ldots, d$. It is clear 
from our construction that these preserve $\mu^{[1]}$. Finally, let $\pi_1$ be the projection of 
$X^2$ onto the first coordinate. Now to form $(X^{[2]}, \Sigma^{[2]}, \mu^{[2]}, T^{[2]})$ 
we apply this construction to the system $(X^{[1]}, \Sigma^{[1]}, \mu^{[1]}, T^{[1]})$ but 
taking the relatively independent self-product of $\mu^{[1]}$ over the different isotropy 
factor $(\Sigma^{[1]})^{T_1^{[1]} (T_2^{[1]})^{-1}}$ and lifting $T_1^{[1]}$ to 
$T_1^{[1]} \times T_2^{[1]}$ and $T_i^{[1]}$ to $T_i^{[1]} \times T_i^{[1]}$ for 
$i = 2, 3, \ldots, d$. We continue iterating this 
construction, at each step forming $(X^{[k]}, \Sigma^{[k]}, \mu^{[k]}, T^{[k]})$ by taking 
the relatively independent self-product over 
$(\Sigma^{[k-1]})^{T_1^{[k-1]} (T_k^{[k-1]})^{-1}}$ and lifting $T_1^{[k-1]}$ to 
$T_1^{[k-1]} \times T_k^{[k-1]}$ and $T_i^{[k-1]}$ to $T_i^{[k-1]} \times T_i^{[k-1]}$ for 
$i = 2, 3, \ldots, d$, until we reach $k = d$. This gives 
the Host-Kra self-joining. Our convention is to index the $2^d$-fold product $X^{[d]}$ 
that results by the power set $\mathcal{P}[d]$ (the set of all subsets of 
$\{1, 2, \ldots, d\}$), so that 
$X^{[d]} = X^{\mathcal{P}[d]}$, in such a way that $X^{[1]}$ corresponds to the factor 
$X^{\{\emptyset, \{1\}\}}$ of this 
larger product, $X^{[2]}$ to the factor $X^{\{\emptyset, \{1\}, \{2\}, \{1,2\}\}}$, and so 
on. In addition, we write 
$\pi_\alpha$ for the $2^d$ coordinate projections $X^{[d]} \to X$. We can now easily concatenate 
the above specifications to write out the resulting transformations in terms of 
the original $T_i$: $T_1^{[d]} = \prod_{\alpha \in \mathcal{P}[d]} T_{1,\alpha}$ with 

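$$T_{1,\alpha} := \begin{cases} T_1 & \text{if } \alpha = \emptyset \\ \mathrm{id}_X & \text{if } \alpha = \{1\} \\ T_i & \text{if } \max\alpha = i \text{ for } i = 2, 3, \ldots, d, \end{cases}$$ 

and $T_i^{[d]}$ is simply $T_i^{\times 2^d}$ for $i = 2, 3, \ldots, d$. 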
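
As a concrete check of this indexing, the case $d = 2$ can be written out in full. The following is only an illustrative unpacking of the formula above; the order in which the four coordinates are listed ($\emptyset$, $\{1\}$, $\{2\}$, $\{1,2\}$) is an assumption made here for display purposes.

```latex
% Illustrative special case d = 2. The coordinate ordering
% (\emptyset, {1}, {2}, {1,2}) is an assumption made for display only.
\[
  X^{[2]} = X^{\mathcal{P}[2]}, \qquad
  \mathcal{P}[2] = \{\emptyset, \{1\}, \{2\}, \{1,2\}\},
\]
\[
  T_1^{[2]} = T_1 \times \mathrm{id}_X \times T_2 \times T_2, \qquad
  T_2^{[2]} = T_2 \times T_2 \times T_2 \times T_2 .
\]
% The first two coordinates reproduce T_1^{[1]} = T_1 x id_X, while the
% last two coordinates (those alpha with max(alpha) = 2) carry T_2.
```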

This can serve as an alternative to the Furstenberg self-joining in light of the 
following lemma: 

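Lemma 5.1 (The Host-Kra self-joining controls nonconventional averages). If 
$f_1 \in L^\infty(\mu)$ is such that 

$$\int f_1 \circ \pi_\emptyset \cdot \Big( \prod_{\alpha \in \mathcal{P}[d] \setminus \{\emptyset\}} f_\alpha \circ \pi_\alpha \Big) \, d\mu^{[d]} = 0$$ 

for every choice of $f_\alpha \in L^\infty(\mu)$ for $\alpha \in \mathcal{P}[d] \setminus \{\emptyset\}$, then also 

$$\frac{1}{N} \sum_{n=1}^{N} \prod_{i=1}^{d} f_i \circ T_i^n \to 0$$ 

in $L^2(\mu)$ for every choice of $f_2, f_3, \ldots, f_d \in L^\infty(\mu)$. 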

Proof This follows essentially by $d$ times applying alternately the van der Corput 
estimate, just as in the proof of Lemma 4.7, and then the Cauchy-Schwarz 
inequality for the space $L^2(\mu)$. The argument is just as for the case of powers of a 
single transformation treated by Host and Kra in [13] (see their Theorem 12.1 and 
the construction of Section 4), and we omit the details. □ 
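
For orientation, the simplest case $d = 1$ needs no van der Corput step and reduces to the mean ergodic theorem. The sketch below assumes only the standard formula for integrating a product against a relatively independent self-joining over a factor; the notation $\mathsf{E}_\mu[\,\cdot \mid \Sigma^{T_1}]$ for conditional expectation is an assumption of this sketch. It is a sanity check on the statement, not a substitute for the omitted argument.

```latex
% Sanity check for d = 1, where \mu^{[1]} = \mu \otimes_{\Sigma^{T_1}} \mu.
% Standard fact for a relatively independent self-joining over a factor:
\[
  \int f_1 \circ \pi_\emptyset \cdot f_{\{1\}} \circ \pi_{\{1\}} \, d\mu^{[1]}
  = \int \mathsf{E}_\mu[f_1 \mid \Sigma^{T_1}] \cdot
         \mathsf{E}_\mu[f_{\{1\}} \mid \Sigma^{T_1}] \, d\mu .
\]
% Choosing f_{\{1\}} = \overline{f_1}, the vanishing hypothesis forces
\[
  \int \big| \mathsf{E}_\mu[f_1 \mid \Sigma^{T_1}] \big|^2 \, d\mu = 0
  \quad\Longrightarrow\quad
  \mathsf{E}_\mu[f_1 \mid \Sigma^{T_1}] = 0 .
\]
% The mean ergodic theorem then gives the conclusion of the lemma:
\[
  \frac{1}{N} \sum_{n=1}^{N} f_1 \circ T_1^n
  \;\longrightarrow\; \mathsf{E}_\mu[f_1 \mid \Sigma^{T_1}] = 0
  \quad \text{in } L^2(\mu).
\]
```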

Writing $(X^{(1)}, \Sigma^{(1)}, \mu^{(1)}, T^{(1)}) := (X^{[d]}, \Sigma^{[d]}, \mu^{[d]}, T^{[d]})$ 
we can now use the machinery of Host-Kra self-joinings to build a tower of extensions 
of $(X, \Sigma, \mu, T)$ 
and deduce that their inverse limit is pleasant, as we did using the Furstenberg 
self-joining in Proposition 4.6. This requires grouping together the various 
factors in the integrand of 

$$\int f_1 \circ \pi_\emptyset \cdot \Big( \prod_{\alpha \in \mathcal{P}[d] \setminus \{\emptyset\}} f_\alpha \circ \pi_\alpha \Big) \, d\mu^{[d]}$$ 

according to the partition 
$\mathcal{P}[d] \setminus \{\emptyset\} = \bigcup_{i=1}^{d} \{\alpha : \max\alpha = i\}$, noting that the 
above explicit description of $T_1^{[d]}$ tells us that $f_{\{1\}} \circ \pi_{\{1\}}$ is 
$T_1^{[d]}$-invariant and that 

$$\prod_{\alpha :\ \max\alpha = i} f_\alpha \circ \pi_\alpha$$ 

is $T_1^{[d]} (T_i^{[d]})^{-1}$-invariant for $i = 2, 3, \ldots, d$. The remaining details 
of the argument are almost identical to those for Proposition 4.6. We note that in this 
argument the one-step extension $(X^{(1)}, \Sigma^{(1)}, \mu^{(1)}, T^{(1)})$ is already the 
top member of a 
height-$d$ tower of self-joinings. These two towers serve different purposes in the 
proof, and should not be confused: the $d$ smaller extensions used to build up to 
$(X^{(1)}, \Sigma^{(1)}, \mu^{(1)}, T^{(1)})$ correspond to the $d$ appeals to the van der 
Corput estimate 
during the proof of Lemma 5.1. 
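
To make the grouping concrete, here is how it looks when $d = 2$. This is an illustrative instance only, using the explicit formula $T_1^{[2]} = T_1 \times \mathrm{id}_X \times T_2 \times T_2$ with the coordinates listed in the assumed order $\emptyset$, $\{1\}$, $\{2\}$, $\{1,2\}$.

```latex
% Illustrative grouping for d = 2; coordinate order (\emptyset, {1}, {2}, {1,2})
% is an assumption made for display only.
\[
  \int f_1 \circ \pi_\emptyset \cdot
  \underbrace{f_{\{1\}} \circ \pi_{\{1\}}}_{T_1^{[2]}\text{-invariant}} \cdot
  \underbrace{\big( f_{\{2\}} \circ \pi_{\{2\}} \cdot
                    f_{\{1,2\}} \circ \pi_{\{1,2\}} \big)}_{
                    T_1^{[2]} (T_2^{[2]})^{-1}\text{-invariant}}
  \, d\mu^{[2]} .
\]
% Reason: T_1^{[2]} acts as the identity on the {1} coordinate, and it agrees
% with T_2^{[2]} = T_2 x T_2 x T_2 x T_2 on the {2} and {1,2} coordinates, so
% T_1^{[2]} (T_2^{[2]})^{-1} acts as the identity on those two coordinates.
```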

The choice between the Furstenberg and Host-Kra self-joinings certainly affects 
the structure of the pleasant extension that emerges, but seems to make little 
difference to the overall complexity of the proof, since we do not exploit any of this 
more particular structure. The advantage of the Host-Kra self-joining is that it 
does not require an iterative appeal to Theorem 1.1 for its proof, but on the other 
hand that is traded off into a more complicated, alternating appeal to the van der 
Corput estimate and the Cauchy-Schwarz inequality in the proof of Lemma 5.1, 
rather than the simple single application made to prove Lemma 4.7. 

Looking beyond the above considerations, it may be interesting to search for a 
quicker way to pass directly to a pleasant extension: 

Question Can we construct a pleasant extension 
$(\tilde X, \tilde\Sigma, \tilde\mu, \tilde T)$ in a finite number 
of steps, without invoking an inverse limit? $\lhd$ 

Remark Since a preprint of this paper first appeared, Bernard Host has shown 
in [11] that by using the above Host-Kra self-joining, one iteration of the above 
construction suffices to produce a pleasant system: the passage to the inverse limit 
is already superfluous! His proof of this requires a slightly more delicate analysis 
than the work of our Subsection 4.2, but in fact it seems likely that it applies 
equally well to both self-joinings. $\lhd$ 

5.2 Possible further questions 

During the course of proving Theorem 1.1 we have made essential use of the 
commutativity of $\mathbb{Z}^r$, in addition to the commutativity of the different 
actions $T_1$, $T_2$, ..., $T_d$. It is possible that our theorem could be generalized 
by considering the 
averages 



for $d$ commuting actions $T_1, T_2, \ldots, T_d$ on $(X, \Sigma, \mu)$ of a more general 
amenable group $\Gamma$ with a Følner sequence $(I_N)_{N \geq 1}$ and base-point sequence 
$(a_N)_{N \geq 1}$. In 
this case, if we mimic our straightforward construction of the Furstenberg 
self-joining, we obtain a measure $\mu^{*d}$ on $X^d$ that is 
$T_1 \times T_2 \times \cdots \times T_d$-invariant, but 
it may not now be invariant under any of the diagonal actions $T_i^{\times d}$. It seems that 
the ideas of the present paper cannot yield this stronger result (if it is true at all) 
without some additional new insight. 

Another generalization of Theorem 1.1 has been conjectured by Bergelson and 
Leibman in [2]: 

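Conjecture (Nilpotent nonconventional ergodic averages). If 
$T : \Gamma \curvearrowright (X, \Sigma, \mu)$ 
is a probability-preserving action of a discrete nilpotent group $\Gamma$ and 
$\gamma_1, \gamma_2, \ldots, \gamma_d \in \Gamma$ 
then for any $f_1, f_2, \ldots, f_d \in L^\infty(\mu)$ the nonconventional ergodic averages 

$$\frac{1}{N} \sum_{n=1}^{N} \prod_{i=1}^{d} f_i \circ T^{\gamma_i^n}$$ 

converge to some limit in $L^2(\mu)$. 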

I do not know whether the methods of the present paper can be brought to bear on 
this conjecture; it seems likely that considerable further new machinery would be 
needed here also. 

In a different direction, it is unknown whether Theorem 1.1 holds with pointwise 
convergence in place of convergence in $L^2(\mu)$. The methods of the present paper 
seem to contribute very little to our understanding of this problem; crucially, while 
the Furstenberg self-joining allows us to prove that 
$f_1 - \mathsf{E}_\mu[f_1 \,|\, \Xi]$ contributes 
negligibly to the $L^2(\mu)$ convergence of our averages inside the extended system, 
so that we can replace $f_1$ with $\mathsf{E}_\mu[f_1 \,|\, \Xi]$, we currently know of no 
good way to 
control this approximation pointwise, as would be essential for any approach to 
the question of pointwise convergence using the machinery of pleasant extensions 
and their factors. 




